Typical Depth of a Digital Search Tree built on a general source

Authors

Kanal Hun

Brigitte Vallée

Abstract

The digital search tree (dst) plays a central role in compression algorithms of Lempel-Ziv type. This important structure can be viewed as a mixture of a digital structure (the trie) and a binary search tree. Its probabilistic analysis is thus involved, even when the text is produced by a simple source (a memoryless source or a Markov chain). After the seminal paper of Flajolet and Sedgewick (1986) [11], which deals with the unbiased memoryless case, many papers by Drmota, Jacquet, Louchard, Prodinger, Szpankowski, and Tang, published between 1990 and 2005, dealt with general memoryless sources or Markov chains and analysed the main parameters of dst's, namely internal path length, profile, and typical depth (see for instance [7, 15, 14]). Here, we are interested in a more realistic analysis, where the words are emitted by a general source in which the emission of symbols may depend on the whole previous history. Text algorithms and digital structures have already been analysed for general sources, for instance tries ([3, 2]) and basic sorting and searching algorithms ([22, 4]). However, the case of digital search trees has not yet been considered, and this is the main subject of the paper. The idea of this study is due to Philippe Flajolet, and the first steps of the work were carried out with him at the end of 2010.
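To make the "trie mixed with a search tree" description concrete, here is a minimal sketch of digital search tree insertion. It is an illustration only, not code from the paper: the names `Node`, `dst_insert`, and `build_dst` are hypothetical, and the sketch assumes the keys are distinct finite strings of equal length, standing in for the infinite words emitted by a source. Each node stores a whole key, as in a search tree, while the branch taken at depth d is chosen by the d-th symbol of the key being inserted, as in a trie.

```python
class Node:
    """A dst node: stores one full key and one child per symbol."""
    def __init__(self, key):
        self.key = key
        self.children = {}


def dst_insert(root, key):
    """Insert `key` and return (root, depth at which the key is stored).

    Assumption: keys are distinct and of equal length, so the walk
    always ends before the symbols of `key` are exhausted.
    """
    if root is None:
        return Node(key), 0          # the first key occupies the root
    node, depth = root, 0
    while True:
        if node.key == key:          # key already present
            return root, depth
        symbol = key[depth]          # branch on the depth-th symbol
        child = node.children.get(symbol)
        if child is None:            # free slot: store the key here
            node.children[symbol] = Node(key)
            return root, depth + 1
        node, depth = child, depth + 1


def build_dst(keys):
    """Insert keys in order; return the root and the depth of each key."""
    root, depths = None, []
    for key in keys:
        root, d = dst_insert(root, key)
        depths.append(d)
    return root, depths


if __name__ == "__main__":
    root, depths = build_dst(["0110", "0011", "1010", "0101", "1100"])
    print(depths)                      # depth of each inserted key
    print(sum(depths) / len(depths))   # empirical average depth
```

In this sketch, the depth of a node picked uniformly at random among the stored keys plays the role of the typical depth, and the sum of all depths is the internal path length.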


Similar resources

Probabilistic analysis of the asymmetric digital search trees

In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...


Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...


Average Profile and Limiting Distribution for a Phrasesize In

Consider the parsing algorithm due to Lempel and Ziv that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice the following parameters are of interest: number of phrases, the size of a phrase, the number of phrases of given size, and so forth. In this paper, we focus on the size of a rando...


The expected profile of digital search trees

A digital search tree (DST) is a fundamental data structure on words that finds various applications from the popular Lempel-Ziv’78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it depends on the number of stored strings and the distance from the root. Most parameters of DST (e.g., depth, height, fillup)...
